Maximizing the Area under the ROC Curve using Incremental Reduced Error Pruning
Abstract
The use of incremental reduced error pruning for maximizing the area under the ROC curve (AUC) instead of accuracy is investigated. A commonly used accuracy-based exclusion criterion is shown to include rules that result in concave ROC curves as well as to exclude rules that result in convex ROC curves. A previously proposed exclusion criterion for unordered rule sets, based on the lift, is on the other hand shown to be equivalent to requiring a convex ROC curve when adding a new rule. An empirical evaluation shows that using lift for ordered rule sets leads to a significant improvement. Furthermore, the generation of unordered rule sets is shown to allow for more fine-grained rankings than ordered rule sets, which is confirmed by a significant gain in the empirical evaluation. Eliminating rules that do not have a positive effect on the estimated AUC is shown to slightly improve AUC for ordered rule sets, while no improvement is obtained for unordered rule sets.
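To make the contrast between the two exclusion criteria concrete, the following sketch compares them on a skewed pruning set. This is an illustration under assumed counts and function names, not the paper's implementation: the accuracy-based check keeps a rule only if it covers more positives than negatives on the pruning set, while the lift-based check keeps a rule only if its precision exceeds the prior proportion of positives, which is the same as requiring the rule's ROC segment to be steeper than the diagonal.

```python
# Illustrative sketch (assumed names and counts, not the paper's code):
# comparing an accuracy-based and a lift-based exclusion criterion for a
# candidate rule evaluated on a pruning set.

def rule_segment_slope(tp, fp, total_pos, total_neg):
    """Slope of the ROC segment contributed by a rule that covers
    tp positives and fp negatives out of total_pos / total_neg."""
    tpr_gain = tp / total_pos if total_pos else 0.0
    fpr_gain = fp / total_neg if total_neg else 0.0
    return float("inf") if fpr_gain == 0 else tpr_gain / fpr_gain

def accept_by_accuracy(tp, fp):
    """Accuracy-based criterion: keep the rule only if it covers more
    positives than negatives on the pruning set."""
    return tp > fp

def accept_by_lift(tp, fp, total_pos, total_neg):
    """Lift-based criterion: keep the rule only if its precision exceeds
    the prior proportion of positives, i.e. its ROC segment is steeper
    than the diagonal."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    prior = total_pos / (total_pos + total_neg)
    return precision > prior

# Skewed pruning set: 100 positives, 900 negatives.  A rule covering
# tp=30, fp=40 is rejected by the accuracy criterion (30 < 40) although
# its ROC segment is much steeper than the diagonal (slope 6.75), while
# the lift criterion accepts it (precision 0.43 > prior 0.10).
print(accept_by_accuracy(30, 40))                # False
print(accept_by_lift(30, 40, 100, 900))          # True
print(rule_segment_slope(30, 40, 100, 900))      # 6.75
```

The example mirrors the abstract's observation that the accuracy-based criterion can exclude rules whose segments would keep the ROC curve convex.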
Similar Papers
Uplift Modeling with ROC: An SRL Case Study
Uplift modeling is a classification method that determines the incremental impact of an action on a given population. Uplift modeling aims at maximizing the area under the uplift curve, which is the difference between the areas under the lift curves of the subject and control sets. Lift and uplift curves are seldom used outside of the marketing domain, whereas the related ROC curve is frequently used i...
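As a rough numeric companion to the definition above, the sketch below estimates the area under an uplift curve as the difference between the treatment and control lift curves integrated over the targeting depth; the function names, the gains-style lift curve, and the trapezoidal integration are assumptions, not the cited paper's method.

```python
# Illustrative sketch (assumed names, not the cited paper's method):
# the uplift curve as the difference between treatment and control
# lift/gains curves, with its area approximated by trapezoidal integration.
import numpy as np

def gains_curve(y_true, scores, grid):
    """Fraction of all positives captured when targeting the top-k
    fraction of the population ranked by model score."""
    y = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    captured = np.concatenate(([0.0], np.cumsum(y[order]) / max(y.sum(), 1.0)))
    depth = np.linspace(0.0, 1.0, len(captured))
    return np.interp(grid, depth, captured)

def area_under_uplift(y_treat, s_treat, y_ctrl, s_ctrl, n_grid=101):
    """Area under the uplift curve: the treatment-minus-control difference
    of the gains curves, integrated over the targeting depth."""
    grid = np.linspace(0.0, 1.0, n_grid)
    uplift = gains_curve(y_treat, s_treat, grid) - gains_curve(y_ctrl, s_ctrl, grid)
    # Trapezoidal integration over the grid.
    return float(np.sum(0.5 * (uplift[1:] + uplift[:-1]) * np.diff(grid)))

# Toy example: a model that ranks treated responders well gives a positive
# area; passing identical treatment and control data gives zero.
print(area_under_uplift([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1],
                        [1, 0, 1, 0], [0.7, 0.6, 0.4, 0.3]))
```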
Comparison of Gestational Diabetes Prediction Between Logistic Regression, Discriminant Analysis, Decision Tree and Artificial Neural Network Models
Background and Objectives: Gestational Diabetes Mellitus (GDM) is the most common metabolic disorder in pregnancy. If detected early, some of its complications can be prevented. The aim of this study was to investigate early prediction of GDM by logistic regression (LR), discriminant analysis (DA), decision tree (DT) and perceptron artificial neural network (ANN) and to compare these m...
Lae-Jeong Park and Jung-Ho Moon: A Learning Method of Directly Optimizing Classifier Performance at Local Operating Range
This paper addresses an effective learning method that enables us to directly optimize a neural network classifier's discrimination performance at a desired local operating range by maximizing a partial area under a receiver operating characteristic (ROC) or domain-specific curve, which is difficult to achieve with classification accuracy or mean squared error (MSE)-based learning methods. The ef...
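For readers who want to reproduce the quantity being optimized, a partial area under an empirical ROC curve can be computed directly from scored examples. The sketch below is a generic illustration (the window bound and function name are assumptions), not the learning method proposed in the paper.

```python
# Generic illustration (not the paper's learning method): partial area under
# the empirical ROC curve, restricted to false positive rates in [0, fpr_high].
import numpy as np

def partial_auc(y_true, scores, fpr_high=0.2):
    """Trapezoidal area under the ROC curve for FPR in [0, fpr_high].
    Ties in scores are broken by sort order; good enough for a sketch."""
    y = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = y[order]
    P, N = y.sum(), (1.0 - y).sum()
    tpr = np.concatenate(([0.0], np.cumsum(y) / P))
    fpr = np.concatenate(([0.0], np.cumsum(1.0 - y) / N))
    area = 0.0
    for i in range(1, len(fpr)):
        x0, x1 = fpr[i - 1], min(fpr[i], fpr_high)
        if x1 > x0:
            # TPR at the (possibly clipped) right end of this ROC segment.
            t1 = tpr[i - 1] + (tpr[i] - tpr[i - 1]) * (x1 - x0) / (fpr[i] - fpr[i - 1])
            area += 0.5 * (tpr[i - 1] + t1) * (x1 - x0)
        if fpr[i] >= fpr_high:
            break
    return area

# Example: six scored examples, area restricted to FPR in [0, 0.5] (about 0.278).
print(partial_auc([1, 0, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.3, 0.1], fpr_high=0.5))
```

Maximizing a quantity of this form at a chosen operating range, rather than overall accuracy or MSE, is the goal described in the snippet above.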
Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors
With the advancement and development of computer network technologies, the way for intruders has become easier; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing the large number of network traffic features. Removing un...
Risk Estimation by Maximizing the Area under ROC Curve
Risks exist in many different domains: medical diagnoses, financial markets, fraud detection and insurance policies are some examples. Various risk measures and risk estimation systems have hitherto been proposed, and this paper suggests a new risk estimation method. Risk estimation by maximizing the area under a receiver operating characteristic (ROC) curve (REMARC) defines risk estimation as ...